The PI of this project was Jeff Scargle of NASA/Ames. Co-I's were Alanna Connors of Eureka 
Scientific/Wellesley, and myself. Part of the work was subcontracted to Eureka Scientific via SAO, with 
Vinay Kashyap as PI. This project was originally assigned grant number NCC2-1206, and was later 
changed to NCC2-1350 for administrative reasons. 

The goal of the project was to obtain, derive, and develop statistical and data analysis tools that would be of 
use in the analyses of high-resolution, high-sensitivity data that are becoming available with new 
instruments. This is envisioned as a cross-disciplinary effort with a number of "collaborators" including 
some at SAO (Aneta Siemiginowska, Peter Freeman) and at the Harvard Statistics department (David van 
Dyk, Rostislav Protassov, Xiao-li Meng, Epaminondas Sourlas, et al). 

We have developed a new tool to reliably measure the metallicities of thermal plasma. It is infeasible to 
obtain high-resolution grating spectra for most stars, so one must make the best possible determination 
based on lower-resolution, CCD-type spectra. Most analyses of such spectra have yielded metallicities 
significantly lower than those obtained from high-resolution grating data, where available (see, e.g., 
Brickhouse et al., 2000, ApJ 530, 387). Such results have led to the proposal of the existence of 
so-called Metal Abundance Deficient, or "MAD", stars (e.g., Drake, J.J., 1996, Cool Stars 9, ASP Conf. 
Ser. 109, 203). We find, however, that many of these analyses may be systematically underestimating the 
metallicities. Using a newly developed method that correctly treats the low-counts regime at the 
high-energy tail of the stellar spectra (van Dyk et al. 2001, ApJ 548, 224), we have found that the 
metallicities of these stars are generally comparable to their photospheric values. The results were 
reported at the AAS (Sourlas, Yu, van Dyk, Kashyap, and Drake, 2000, BAAS 196, v32, #54.02), and at the 
conference on Statistical Challenges in Modern Astronomy (Sourlas, van Dyk, Kashyap, Drake, and Pease, 
2003, SCMA III, Eds. E.D. Feigelson, G.J. Babu, New York: Springer, p489-490). 
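
To illustrate why the low-counts regime needs a proper Poisson treatment, the following minimal Python 
sketch computes how often the familiar Gaussian interval N +/- sqrt(N) actually contains the true Poisson 
mean. It is not the method of van Dyk et al. (2001); the function names and the chosen means are 
assumptions made purely for illustration. 

```python
import math

def poisson_pmf(n, mu):
    """Poisson probability of observing n counts given mean mu (> 0)."""
    return math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))

def sqrt_n_coverage(mu):
    """Exact probability that [N - sqrt(N), N + sqrt(N)] contains mu."""
    n_max = int(mu + 10.0 * math.sqrt(mu) + 20)  # truncate the negligible upper tail
    total = 0.0
    for n in range(n_max + 1):
        half_width = math.sqrt(n)  # the usual "root-N" error bar
        if n - half_width <= mu <= n + half_width:
            total += poisson_pmf(n, mu)
    return total

if __name__ == "__main__":
    # Illustrative (assumed) true means, from very faint to bright.
    for mu in (0.5, 2.0, 10.0, 100.0):
        print(f"true mean {mu:6.1f}: coverage {sqrt_n_coverage(mu):.3f}")
```

For a true mean of 0.5 counts the interval covers the truth only about 30% of the time, far short of the 
nominal 68%, while at 100 counts the coverage is close to nominal. 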

We also described the limitations of one of the most egregiously misused and misapplied statistical tests in 
the astrophysical literature, the F-test for verifying model components (Protassov, van Dyk, Connors, 
Kashyap, and Siemiginowska, 2002, ApJ, 571, 545). A search through the ApJ archives turned up 170 papers 
in the previous 5 years that used the F-test explicitly in some form or other, the vast majority of them 
incorrectly. Looking at just 4 issues of the ApJ in 2001, we found 13 instances of its use, of which 9 
were demonstrably incorrect. Clearly, it is difficult to overstate the importance of this issue. 
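
One well-known way such tests go wrong is when the null value of the added parameter sits on the boundary 
of its allowed range, as with a line strength that cannot be negative; the nominal reference distribution 
then no longer applies. The Python sketch below is a toy Monte Carlo under assumed conditions (Gaussian 
data with unit variance, a mean constrained to be non-negative, an arbitrary seed and sample size), not a 
reproduction of the analysis in Protassov et al. (2002). It estimates the false-alarm rate of a likelihood 
ratio test that is naively calibrated against a chi-square distribution with one degree of freedom. 

```python
import random

# 95th percentile of the chi-square distribution with 1 degree of freedom.
CHI2_1_CRIT_95 = 3.841458820694124

def lrt_statistic(sample):
    """Likelihood ratio statistic for H0: mean = 0 vs. mean >= 0 (unit variance)."""
    n = len(sample)
    mean = sum(sample) / n
    mu_hat = max(0.0, mean)  # the constraint pins the estimate at the boundary
    return n * mu_hat ** 2   # equals 2 * (logL(mu_hat) - logL(0))

def false_alarm_rate(n_trials=20000, n_points=25, seed=1):
    """Fraction of simulated null data sets exceeding the nominal 5% threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        if lrt_statistic(sample) > CHI2_1_CRIT_95:
            hits += 1
    return hits / n_trials

if __name__ == "__main__":
    print(f"nominal rate 0.050, simulated rate {false_alarm_rate():.3f}")
```

In this toy case roughly half of the null data sets give a statistic of exactly zero, so the simulated 
false-alarm rate comes out near 2.5% rather than the nominal 5%: the quoted significance does not mean what 
it appears to mean. 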

We also worked on speeding up the Bayes Blocks and Sparse Bayes Blocks algorithms to make them more 
tractable for large searches. We also supported statistics students and postdocs working on both explicit 
physics-model-based algorithms (spectra with tens of thousands of atomic lines) and "model-free" (i.e., 
non-parametric or semi-parametric) algorithms. Work on using more of the latter is just beginning, while 
the use of multi-scale methods for Poisson imaging has come to fruition: "An Image Restoration Technique 
with Error Estimates", by D. Esch, A. Connors, M. Karovska, and D. van Dyk, was published in ApJ (Esch et 
al. 2004, ApJ, 610, 1213). The code has been delivered to M. Karovska for CXC, and is available for 
beta-testing upon request. 

The other large project we worked on was the self-consistent modeling of logN-logS curves in the 
Poisson limit. logN-logS curves are a fundamental tool in the study of source populations, luminosity 
functions, and cosmological parameters. However, their determination is hampered by statistical effects 
such as the Eddington bias, incompleteness due to detection efficiency, faint source flux fluctuations, etc. 
We have developed a new and powerful method using the full Poisson machinery that allows us to model 
the logN-logS distribution of X-ray sources in a self-consistent manner. Because we properly account for 
all the above statistical effects, our modeling is valid over the full range of the data, and not just for strong 
sources, as is normally done. Using a Bayesian approach and modeling the fluxes with known functional 
forms such as simple or broken power-laws, and conditioning the expected photon counts on the fluxes, the 
background contamination, effective area, detector vignetting, and detection probability, we can delve 
deeply into the low counts regime and extend the usefulness of medium sensitivity surveys such as ChAMP 
by orders of magnitude. The built-in flexibility of the algorithm also allows a simultaneous analysis of 
multiple datasets. We have applied this analysis to a set of Chandra observations (Sourlas, Kashyap, Zezas, 
van Dyk, 2004, HEAD #8, #16.32). 
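
The Eddington bias mentioned above can be seen in a small simulation. The Python sketch below is 
illustrative only and is not the Bayesian method described in this report: it draws source intensities 
from a power law (expressed directly as expected counts), adds Poisson noise, and applies a simple count 
threshold. The slope, minimum intensity, threshold, seed, and all function names are assumptions chosen 
for the example. 

```python
import math
import random

def poisson_draw(mean, rng):
    """Draw one Poisson variate (Knuth's method; Gaussian approximation for large means)."""
    if mean > 50.0:
        # Approximation, adequate here because such sources are far above threshold.
        return max(0, round(rng.gauss(mean, math.sqrt(mean))))
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_survey(n_sources=200000, slope=1.5, min_counts=1.0,
                    threshold=10, seed=7):
    """Return (sources truly above threshold, sources observed above threshold)."""
    rng = random.Random(seed)
    truly_bright = 0
    observed_bright = 0
    for _ in range(n_sources):
        # Inverse-CDF draw from N(>S) proportional to S**(-slope), S >= min_counts.
        u = 1.0 - rng.random()  # lies in (0, 1], so the power below is finite
        expected = min_counts * u ** (-1.0 / slope)
        observed = poisson_draw(expected, rng)
        truly_bright += expected >= threshold
        observed_bright += observed >= threshold
    return truly_bright, observed_bright

if __name__ == "__main__":
    true_n, obs_n = simulate_survey()
    print(f"truly above threshold: {true_n}, observed above threshold: {obs_n}")
```

With these assumed settings more sources are observed above the threshold than truly belong there, because 
the many faint sources that fluctuate upward outnumber the few bright ones that fluctuate downward. A naive 
logN-logS curve built from the observed counts is therefore biased near the detection limit. 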



Another goal of the project is to alert astrophysicists to the existence of good and viable methods of 
data analysis, and to educate them in the use of these methods. Toward that end, we organized a 
number of special sessions at AAS and HEAD conferences. The following is a list of such special sessions: 

1] AAS 197: "New Results from 'Back-to-Basics' Data Analysis: Special Tutorials on Timing and 
Fitting", headed by an invited talk by Larry Bretthorst. 

http://wwwgro.unh.edu/users/aconnors/astrostat/aas197/index.html 

2] AAS 199: "Data Analysis Challenges in Solar and Stellar Astrophysics" 
http://wwwgro.unh.edu/users/aconnors/astrostat/aas199/index.html 

3] AAS 201: "Making it Work: Principled 'Model Free Deconvolution' via Multi-scale Methods" 
http://www.aas.org/publications/baas/v34n4/aas201/S630.htm 

4] In January 2003, we organized a workshop at the Harvard-Smithsonian Center for Astrophysics, 
Cambridge, MA, on "Current Challenges in Multi-Scale Analysis" 

http://www.ics.uci.edu/~dvd/MultScaleConf/ 

5] In June 2003, we organized a special session at the 34th meeting of the Solar Physics Division of the 
AAS at Laurel, MD, on "Data Analysis Challenges in Solar and Stellar Astrophysics" 

6] In September 2003, the 7th Workshop on Case Studies of Bayesian Statistics at Carnegie Mellon 
University, Pittsburgh, PA, featured the work of our group: "Highly Structured Models for Spectral and 
Image Analysis in High Energy Astrophysics", with speakers David van Dyk, David Esch, Yaming Yu, 
Margarita Karovska, Vinay Kashyap, Peter Freeman, Aneta Siemiginowska, and Alanna Connors. The 
speakers got both useful exposure in and feedback from the wider statistics community. There was also an 
interesting SLAC conference, "Statistical Problems in Particle Physics, Astrophysics, and Cosmology", Sep 
8-11, 2003. As E. Feigelson would predict, it was an odd combination of sophisticated scientists clinging to 
traditional but inappropriate techniques (taking the square root of counts doesn't work for small numbers 
or far out in the tail, etc.) and exciting new work on both uses of Bayesian methods and layered, 
sophisticated, more general likelihood-based techniques at the workshop on Case Studies in Bayesian 
Analysis at Penn State. 

7] In September 2004, we organized a special session at the meeting of the High Energy Astrophysics 
Division of the AAS at New Orleans, LA, on "Data Analysis Challenges in Astrophysics" 

http://hea-www.harvard.edu/AstroStat/HEAD2004/ 

It must be noted that we are now faced with a challenge: some of our more recent proposals for AAS special 
sessions and topical sessions were rejected. We were told that competition is becoming fiercer and that we 
are apparently at a disadvantage because we have had so many successful workshops in the past! This points 
to a need to ask AAS committees for help in making AISRP and other "tool" projects more visible. 


